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Abstract 

We present a reformulation of stochastic global optimization as a fil- 
tering problem. The motivation behind this reformulation comes from 
the fact that for many optimization problems we cannot evaluate ex- 
actly the objective function to be optimized. Similarly, we may not be 
able to evaluate exactly the functions involved in iterative optimization 
algorithms. For example, we may only have access to noisy measure- 
ments of the functions or statistical estimates provided through Monte 
Carlo sampling. This makes iterative optimization algorithms behave 
like stochastic maps. Naive global optimization amounts to evolving a 
collection of realizations of this stochastic map and picking the realiza- 
tion with the best properties. This motivates the use of filtering tech- 
niques to allow focusing on realizations that are more promising than 
others. In particular, we present a filtering reformulation of global op- 
timization in terms of a special case of sequential importance sampling 
methods called particle filters. The increasing popularity of particle 
filters is based on the simplicity of their implementation and their flex- 
ibility. For parametric exponential density estimation problems, we 
utilize the flexibility of particle filters to construct a stochastic global 
optimization algorithm which converges to the optimal solution ap- 
preciably faster than naive global optimization. Several examples are 
provided to demonstrate the efficiency of the approach. 



Introduction 

Optimization problems are ubiquitous in science and engineering ranging 
from portfolio optimization to space craft trajectory design to DNA stud- 
ies and computer science [HI [161 E]- As a result of the vast number of 
optimization applications and their related intricacies, the construction of 
optimization algorithms continues to be a subject of intense research. The 
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main problem in optimization is to locate, usually through an iterative pro- 
cedure, the optimal values (minima or maxima) of an objective function 
which depends on a number of parameters. The related problem of estimat- 
ing the values of the parameters for which the objective function attains its 
optimal value is of equal importance in applications. 

For many optimization problems the function to be optimized is a func- 
tion of several variables and can have multiple local extrema. Usually, opti- 
mization algorithms are guaranteed to find a local extremum when started 
from an arbitrary initial condition. In order to reach a global extremum 
we should use an optimization algorithm with multiple starting values and 
choose the best solution. This is called naive global optimization. In naive 
global optimization each solution starting from a different initial condition 
evolves independently of the others. Intuitively, we expect that a global 
optimization algorithm should benefit from the interaction of the different 
solutions. There is a category of global (deterministic and stochastic) opti- 
mization algorithms (e.g. branch and bound, tabu methods, genetic algo- 
rithms, ant colony optimization, swarm optimization, simulated annealing, 
hill climbing) [6l [19] which allow for the different solutions to interact 
with the purpose of allocating more resources in areas of the parameter 
space which seem more promising in an optimization sense. 

In the current work, we propose an algorithm that belongs in this cat- 
egory. For many optimization problems we cannot evaluate exactly the 
objective function to be optimized. Similarly, we may not be able to evalu- 
ate exactly the functions involved in iterative optimization algorithms. For 
example, we may only have access to noisy measurements of the functions or 
statistical estimates provided through Monte Carlo sampling. This makes 
iterative optimization algorithms behave like stochastic maps. The corre- 
sponding global optimization problem also becomes stochastic. The algo- 
rithm we present here is based on the reformulation of the stochastic global 
optimization problem as a filtering problem. In particular, we present a 
filtering reformulation of stochastic global optimization in terms of a spe- 
cial case of sequential importance sampling methods called particle filters 
[HQS]. I n this setting, the desired interaction of the solutions starting from 
different initial conditions is effected through the filtering step (see Section 
[2] for more details). 

The increasing popularity of particle filters is based on the simplicity of 
their implementation and their flexibility. The generic particle filter refor- 
mulation will not necessarily lead to an algorithm that converges faster than 
the naive global optimization algorithm. However, we are able to exploit the 
flexibility of the particle filter reformulation to construct a modified particle 
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filter algorithm that converges appreciably faster than the naive global opti- 
mization algorithm. We provide several examples of parametric exponential 
density estimation of varying difficulty to demonstrate the efficiency of the 
approach. 

The paper is organized as follows. Section [T] discusses local and stochas- 
tic global optimization problems. In Section [2] we present the reformulation 
of stochastic optimization problem as a filtering problem and in particular a 
reformulation which uses particle filters. In Section [3] we apply the particle 
filter reformulation to the problem of estimating the parameters of an expo- 
nential density. Section [5] contains numerical results for several examples to 
illustrate the efficiency of the proposed approach. Finally, in Section [5] we 
provide a discussion and some directions for future work. 

1 Stochastic local and global optimization 

Assume that we are given a scalar objective function H (x) depending on d 
parameters x%, . . . , xj. The purpose of the optimization problem is to com- 
pute the optimal (maximal or minimal) value of H(x), say op(H(x)), as 
well as the optimal parameter vector x = (xi,...,Xd) for which H(x) = 
op(H(x)). For the sake of clarity and without loss of generality, let us as- 
sume that we are interested in obtaining the minimum value of the objective 
function H(x), 

A generic optimization algorithm attempts to locate the optimal param- 
eter vector in the following way: 

Generic optimization algorithm 

1. Pick an initial approximation x^°\ 

2. For k > 0, compute = f(x^ k >), where f(x) = (fi(x), . . . , fd(x)) 
is a (i-dimensional vector- valued function. Different optimization al- 
gorithms use a different function f(x). 

3. Evaluate H(x^ k+1 ^). If H(x^ k+1 ^) satisfies a stopping criterion or k + 1 
is equal to a maximum number of iterations k max , stop. Else, set 
k = k + 1 and proceed to Step 2. 

For many optimization problems, the function H(x) has multiple local 
minima. The generic algorithm described above is, usually, able to locate 
one of the local minima, unless we are very lucky or know a lot about the 
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problem at hand to guide the choice of x^ ^. An obvious global version of 
the algorithm given above is as follows: 



Naive global optimization algorithm 

1. Pick a collection of initial approximations . . . , x$ . 

2. Run, say for M iterations, the generic optimization algorithm for the 
different initial conditions xf*\ . . . , x$ . 

3. Choose x = arg min H{x\ M ^). 

i=i...N 1 

As stated in the introduction, for many optimization problems we only have 
access to a random estimate of the value of the objective function H and 
of the optimization algorithm function /. We will denote those as H s and 
f s respectively. The uncertainty can come from noisy measurements. Also, 
it can be due to Monte Carlo sampling error for cases where H and/or / 
involve expectations with respect to a probability density. Consequently, 
the iterative optimization sequence becomes stochastic. To be more precise, 
the optimization algorithm is modified as follows: 



Stochastic optimization algorithm 

1. Pick an initial approximation x^°\ 

2. For k > 0, compute x^ k+1 ^ = f s (x^). 

3. Evaluate H s (x^ k+1 ^). If the value H s (x ( - k+1 ^) satisfies a stopping crite- 
rion or k + 1 is equal to a maximum number of iterations k max , stop. 
Else, set k = k + 1 and proceed to Step 2. 

Of course, there is an obvious stochastic version of the naive global opti- 
mization algorithm: 



Naive stochastic global optimization algorithm 

1. Pick a collection of initial approximations x^\ . . . , x$ . 

2. Run iV realizations of the stochastic optimization algorithm, one for 
each of the different initial conditions xf\ . . . ,x$, say for M itera- 
tions. 

3. Choose x = arg min H s (x^ M ^). 
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2 Filtering reformulation of stochastic optimiza- 
tion 



In the naive stochastic global optimization algorithm each realization of 
the stochastic optimization algorithm evolves independently of the other 
realizations. As it happens in many optimization problems, most realizations 
of a stochastic optimization algorithm will not start from an initial condition 
near a local minimum and thus will not contribute much to the exploration 
of minima in the space of parameter vectors. We would like to have a way 
to allocate more realizations in areas of the parameter vector space which 
seem more promising in an optimization sense. We propose to do that by 
treating the stochastic global optimization problem as a filtering problem. 

The filtering problem consists of the problem of incorporating the infor- 
mation from noisy observations of a system (noisy measurements of some 
function of the system) to correct the evolution trajectory of a system. The 
evolved system is usually stochastic and, in the case of discrete time formula- 
tions, a stochastic map. In the stochastic optimization algorithm context, we 
can think of: i) the iterative optimization algorithm, i.e. x^ k+1 ^ = f s (x^), 
as the stochastic map and ii) the evaluation of the objective function, i.e. 
H s (x^ k+1 ^), as the noisy observation of the system. This allows us to recast 
(reformulate) the stochastic optimization problem as a filtering problem. 

To make the reformulation more transparent we begin by recalling the 
abstract formulation of the filtering problem. Let k be a discrete time in- 
dex. Consider the following Markovian model which is motivated by the 
stochastic optimization algorithm: 



x (W) =/ s (#), (1) 
y (fc+i) =jffa ( a; (fc+i)) + u (*+i) i (2) 

where is a random variable with known properties. We have intro- 

duced the random variable v to facilitate the formulation of the particle 
filter (see Section I27T]) . Under appropriate regularity assumptions [3], we can 
associate with ([1]) a state evolution density h(x^ k+ ^\x^ k ') and with ([2]) an 
observation density g(y^ k+1 ^\x^ k+1 ^), i.e. 

~ h(x( k+r >\xW), (3) 

I/ (*+l)~^(y(*+D| X ( fc + 1 )). (4) 

Before we proceed with the details of the filtering problem we need to ad- 
dress the issue of the values of the observation sequence y^ k K In filtering 
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problems, the observations are provided by some external measurement of 
the system. In the context of the stochastic optimization algorithm, there 
is no external measurement of the system. However, because we are dealing 
with an optimization problem (in our case minimization), the observation 
sequence j/W is by construction a non-increasing sequence (in k). If we 
are looking for a maximum, then the sequence y^ will be a non-decreasing 
sequence. 

The filtering problem for the Markovian model ([I])-© is to compute, for 
any time t, the posterior distribution p{x^' t ^\y <yl:t ^) and/or the marginal dis- 
tribution p(x^' ) , where x^'^ = (a;( ',...,a;«) and j/( 1:t ) = (y W , . . . , y (*) ] 
With the help of Bayes' theorem the posterior distribution p(x^ 0:t ^ l^ 1 ^) can 
be written as (see e.g. [5]) 

P{ W 1 /p(y( 1: *)|x( 0: t))p(x( 0: *))dx( 0; *)' 
The posterior distribution can be computed recursively by the formula 

P (x \y )-p[x \y ) 

The marginal distribution p(a;W|yv 1:t )) satisfies the recursion: 

Prediction : p^V 1 ^) = f p(x® \x^)p{x^ \y^- x ))dx^ , 



(5) 

tfuWe : P^\y^) = J^^^^y (6) 

The recursive formulas for the posterior and the marginal distribution can 
rarely be computed analytically, since for practical applications they involve 
the evaluation of complex high-dimensional integrals. Thus, we need to 
compute approximations to these distributions which should converge in 
some limit to the exact distributions. Sequential importance sampling (SIS) 
methods [15] have emerged as a flexible and powerful framework to construct 
such approximations. We should note here that SIS methods constitute 
a general framework for importance sampling, which is not restricted to 
filtering problems (for a nice presentation of the SIS framework in its full 
generality see Ch. 3 in [15]). 

We are now in a position to present the filtering reformulation of the 
stochastic optimization problem. It consists of solving the filtering problem 
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for 

x (k + i) = /s(x ( fc))) 

where is a non-increasing sequence. We see that the stochastic op- 

timization problem corresponds to a special case of the filtering problem 
since it includes the additional constraint of the non-increasing sequence of 
observations. 

2.1 Particle filter reformulation of stochastic global opti- 
mization 

The approximation we will use to solve the filtering problem described 
above is a special case of the SIS formalism known as a particle filter 
(or bootstrap filter) [8]. Assume (recall Eqs. ©-([I])) that we are given 
a model for the state evolution density h(x^ k+1 ^\x^) and the observa- 
tion density g(y( fc+1 )|2;( fe+1 )). In particular, let x^ k+l ^ ~ h(x^ k+1 ^\x^) and 
y( k + 1 ) g(y( fc+1 )|x( fc+1 )). Suppose that at time t we have a collection of 

M random samples (particles) (xf\...,x^J) which follow approximately 
the current marginal distribution p(x^ The paper [5] suggests the 

following updating procedure after is observed: 

Particle filter algorithm 

1. Draw from the state density h(x^ t+1 ^ \xj), for j = 1, ... ,M. 

2. Assign to each draw the weight (also known as likelihood) w* = 
5(y (m) l< +1) ),fori = l,...,M. 

w* 

3. Compute the normalized weights Wj = =xr — », for j = 1, . . . , M. 

4. Resample from (x^ , ...,x^ ) with probability proportional to 
Wj, j = 1, . . . , M to produce new samples (xf +1 \ . . . , x^~ ^) at time 
t + 1. 

5. Set t = t + 1 and proceed to Step 1. 

The particle filter approximates the marginal distribution p(x^ |y( 1: *) ) by the 
distribution p(x^\y( 1:t ^) = 4r YliLi $ (*)• ^ can be shown [4], under appro- 

priate regularity assumptions on the state evolution density h(x^ k+1 ^\x^) 



7 



and the observation density g(y^ k+1 ^\x^ k+1 ^), that lim p(x® \y ( - 1 ' €) ) = p(x^\y^) 
almost surely. 

The particle filter construction appears as a natural candidate for the 
reformulation of the global stochastic optimization problem as a filtering 
problem. If one thinks of the particles in the particle filter construction as 
different initial conditions for the stochastic global optimization algorithm, 
then the particle filter becomes the filter analog of stochastic global opti- 
mization. Following the notation of the particle filter algorithm, we can 
specify the value of the observation at step k as y^ = min HJx^i)). 

j=l,...,M * ] 

With this choice for y^ k \ the particle filter formulation of the stochastic 
global optimization problem becomes 

Particle filter algorithm for stochastic global optimization 

1. Draw samples x± \ . . . , x$ from an initial density fM)(x). Set t = 0. 

2. Draw samples x^ 1 ^ from the state density h{x^ +1 \x^p), for j = 
1,...,M. 

3. Compute H s (x { ^ +1) ) for j = 1, . . . , M. 

4. Sety(* +1 )= min H s {x^ 1] ). 

j=l,...,M J 

5. Compute the weights u>*(4* +1) ) = g{y {t+1) \x { ^ +1) ), for j = 1,...,M. 

6. Compute the normalized weights Wj{x*^) = M J — -, for j = 1, . . . , M. 

7. Choose the estimate xt+i = ar g m & x Wj(x^^). Equivalently, we can 

j=l...M J 

choose xt+i = ar g mm H s (x^^). 

j=l,...,M J 

8. If t + 1 is equal to a maximum allowed number of iterations t max or 

satisfies a stopping criterion, terminate the algorithm. Else, 
proceed to next step. 

9. Resample from . . . , x^^ ) with probability proportional to 
Wj, j = 1, . . . , M to produce new samples (xf +1 \ . . . , x^ ^) at time 
t+1. 

10. Set t = t + 1 and proceed to Step 2. 
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The algorithm above computes a sequence of estimates xt, t = 1, ... for 
the solution of the stochastic global optimization problem. Note that in 
Step 7, where we choose the estimate for the solution of the optimization 
problem, we provide two equivalent ways of determining the estimate. It 
is the same whether one chooses the particle with the smallest value of the 
objective function or the largest weight. This duality between minimizing 
the objective function and maximizing the weight (likelihood) of the obser- 
vation of a filtering problem is not specific to this problem. The equivalence 
between least-squares problems and maximum likelihood estimation is well 
known in optimization literature (see e.g. [IS]). Also, one can associate a 
filtering problem to a deterministic (Maslov) optimization process defined 
on a performance space (see the appendix by Del Moral in |12j). The duality 
between Maslov processes and filtering problems relies on idempotent anal- 
ysis and, in particular the Log-Exp transform [12] . The Log-Exp transform 
correspondence is present also in our particle filter reformulation of global 
optimization if we assume that the random variable in the observation 

process is Gaussianly distributed (see Section |3?2|) . 

The particle filter reformulation of global optimization does not suffer 
from any convergence drawbacks compared to the naive global optimiza- 
tion. To see this, we can assume for a moment that there is no observation 
step. Then, the particle filter algorithm reduces to naive global optimiza- 
tion. Naive global optimization will converge to a global minimum given 
an adequate number of particles. What the particle filter adds to the naive 
optimization is to pick, after every iteration, the solution that seems more 
promising in an optimization sense. Since we have chosen the value of the ob- 
servation to be equal to the minimum of the values of the objective function 
over all particles, the sequence of values picked for the objective function 
is guaranteed to be non-increasing. At worst, the value of the objective 
function will not change between iterations if the particles attempt moves 
in the parameter space that increase the value of the objective function. 
This can happen, especially for high-dimensional problems, where the local 
minima can be located in narrow valleys of the objective function [16] . The 
way we have defined the value of the observation ensures convergence to at 
least a local minimum. If, in addition, we have enough particles to explore 
in detail the parameter space, the particle filter will converge to the global 
minimum. However, the particle filter algorithm will not necessarily con- 
verge faster than the naive optimization algorithm. Yet, the reason we are 
presenting the particle filter reformulation of global optimization is because 
of its inherent flexibility. This means that starting from the generic form 
of the particle filter prescribed above, we can derive modified particle filters 
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that exploit the specifics of the optimization algorithm used in the pre- 
diction step (e.g. Robbins-Monro, Kiefer-Wolfowitz, Levenberg-Marquardt 
algorithm etc.). This flexibility is exploited in Section [3.2.11 to construct a 
modified particle filter algorithm with faster convergence than the related 
naive optimization algorithm. 

3 Application to parametric exponential density 
estimation 

We apply the particle filter reformulation of stochastic global optimization 
to the problem of parametric density estimation for exponential densities. 
Exponential densities are, partly due to their nice mathematical features, 
widely used in the modeling of densities of systems of interacting variables 
in different contexts, ranging from Hamiltonian systems to image process- 
ing and bioinformatics (see [10] and references therein). As a result, there 
is an increased interest in algorithms for estimating and manipulating such 
densities numerically. In [18] we presented an algorithm based on maximum 
likelihood for the estimation and renormalization (marginalization) of expo- 
nential densities. In the next section we present the main ideas of the work 
in [18], in particular, how maximum likelihood estimation for exponential 
densities leads to an optimization problem. 

3.1 Parametric exponential density estimation as an opti- 
mization problem 

We begin our presentation with a few facts about families of exponential 
densities (for more details see e.g. [10]). Let x = (xi,...,x n ) be an 
n-dimensional random vector taking values in H n C R n . The set H n = 
5i X H 2 X . . . H n , where x\ 6 Hi, . . . , x n € H n . Also, let ^fc(x), k = 1, . . . , I 
be a collection of functions of x. The functions ipk are known as poten- 
tials or sufficient statistics. Let ip = (ipi, . . . ,ipi) be the vector of potential 
functions. Associated with the vector ip is a vector a = (ax, . . . ,ai) whose 
elements are called canonical or exponential parameters. The exponential 
family associated with -0 is the collection of density functions (parametrized 
by a) of the form 




(7) 
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where (a,ij}(x)) = Ylk=i a fcV , fc( x ) an d Z(a) = f sn exp(—(o£,ift(x)))cbc. The 
exponential family is defined only for the set 



A = {a G K\Z(a) < oo}. 



(8) 



Suppose that we are given a collection of N independent samples of an n- 
dimensional random vector x. In general, we do not know which density 
the samples are drawn from. There are many examples in practical applica- 
tions where the random vector comes from a exponential density (see |10|). 
However, even if we do not know that the samples are drawn from an expo- 
nential density, we can try to fit an exponential density to the samples. This 
will become clearer when we formulate the moment-matching problem and 
exploit some properties of the exponential densities. The basic idea behind 
the algorithm we present here is to estimate the unknown parameter vector 
a by maximizing the likelihood function of the samples. For a collection of 
N independent samples of the random vector x, the likelihood function L is 
defined as (see e.g. [TT] ) 



where p(xj,a) is the unknown exponential density whose parameters a we 
wish to determine. We associate a potential function ip^, k = 1, . . . ,1 with 
every parameter a/%. Maximization of L with respect to the parameters otj~ 
produces an estimate a for a. Under suitable regularity conditions, the se- 
quence of estimates a for increasing values of N is asymptotically efficient 
and tends, with probability one, to a local maximum in parameter space. 
From now on we will use the notation a instead of a to denote the maxi- 
mum likelihood estimate of the parameters keeping in mind that this is only 
an estimate of the parameters. In addition, we will be working with the 
logarithm of the likelihood logL, since it does not alter the position of the 
maximum and also leads to formulas that are more easily manipulated. Dif- 
ferentiation of log L with respect to the and setting the derivative equal 
to zero results in 



N 




(9) 



3=1 




(10) 



where 



J S n ^fc(x)exp(-(q,'0(x)))dx 
J S n exp(-(a, ip(x)))dx 



(11) 
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is the expectation value of the function ipk with respect to the density p(x, a). 
The right side of (|10p is the average (moment) of the function i/j^ as calcu- 
lated from the given samples. The I equations in (jlOp define the moment- 
matching problem. What we want to do is to estimate the parameters a 
so that the conditions in (|lUp are satisfied. The question of whether such a 
problem has a solution, and if it does whether it is unique can be addressed 
through convex analysis (see [18] for more details). 

Now that we have defined the moment-matching problem we have to find 
a way to actually estimate the parameter vector a. The equations (fTUj) con- 
tain, in general, nonlinear functions of the parameters. Moreover, except for 
very special cases, these nonlinear functions are unknown or very difficult to 
manipulate analytically. Thus, we have to tackle the problem of estimating 
the parameter vector numerically. We can define the Z-dimensional vector 
f («) = (fl( a ), /2(a), • • • , //(«)) as 

1 - 

f k (a) = E a [ip k (x.)}--J2Mxj), k = l,...,l (12) 

3=1 

The moment-matching problem amounts to solving the system of (nonlinear) 
equations fk(a) = 0, k = 1, . . . , I. 

3.1.1 The Levenberg-Marquardt algorithm 

Two popular candidates to solve the system of nonlinear equations fk(ce) = 
0,k = are the method of steepest descent and Newton's method. 

However, both have their drawbacks. The method of steepest descent con- 
verges but can have very slow convergence, while Newton's method converges 
quadratically but it diverges if the initial guess of the solution is not good. 
We choose to solve the moment-matching problem as an optimization prob- 
lem using the Levenberg-Marquardt (LM) algorithm (see e.g. [16J). This is 
a powerful iterative optimization algorithm that combines the advantages 
of the method of steepest descent and Newton's method. First, let us write 
the moment-matching problem as an optimization problem. Define the error 
function e(a) as 

I l 

<o t ) = -Y J 4 = -Y J fl^), (13) 

fe=l fe=l 

where et = fk( a )- The problem of minimizing e(a) is equivalent to solving 
the system of equations fk(a) = 0, k = 1, . . . , I i.e. the zeros of e are solutions 
of the system fk(a) = 0,k = 1, . . . , I and vice versa. The LM algorithm 
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uses a positive parameter A to control convergence and the updates of the 
parameters at step m + 1 are calculated through the formula 

[J T J + Xdiag{J T J)](a (m+1) - a (m) ) = -J T f(a (m) ), (14) 

where J = §§:\ a=a (m),i,j = I,..., I is the Jacobian of f(a (m < ) ) and J T 

its transpose. The matrix diag(J T J) is a diagonal matrix whose diagonal 
elements are the diagonal elements of (J T J). In the literature, the name 
Levenberg-Marquardt is also used to denote the algorithm in (|14p with 
diag(J T J) replaced by the unit matrix /. For the case where we use the 
unit matrix / instead of diag(J T J), it is straightforward to see the connec- 
tion with the methods of steepest descent and Newton's. For A = the 
algorithm reduces to Newton's method, while for very large A we recover 
the steepest descent method. The modification (due to Marquardt) of using 
diag(J T J) becomes important in the case where A is large. In this case if 
we only used the unit matrix / almost all information coming from (J T J) is 
lost. On the other hand, since (J T J) provides information about the curva- 
ture of e, use of the matrix diag(J T J) allows us to incorporate information 
about the curvature even in cases with large A. 

We have to prescribe a way of computing the Jacobian J(a^ m -*). The 
element J™ of the Jacobian is given by 

J,M m) ) = [^(x)^(x)] - E a(m) Mx)]E alm) [^(x)]) (15) 

for i,j = 1, . . . , I (note that the Jacobian is symmetric) So, all the quanti- 
ties involved in equation (fTl"|) can be expressed as expectation values with 
respect to the m-th step parameter estimate cc- m \ More details about the 
implementation of the LM algorithm can be found in [16] . 

If one can compute only Monte Carlo estimates of the necessary expec- 
tation values appearing in the LM algorithm, the question of convergence of 
the algorithm becomes more involved. Starting with the work of Robbins- 
Monro and Kiefer-Wolfowitz [13], there has been a vast literature on the 
subject of constructing convergent stochastic algorithms to obtain the zeros 
of a function for which we only have noisy observations. The optimization 
problem we are interested in fits in this category since we are looking for the 
zeros of the error function e(a) for which we can only compute Monte Carlo 
estimates. It is not generally true that replacing the expectation values ap- 
pearing in LM (or Newton's method) with Monte Carlo estimates will lead 
to a stochastic algorithm that converges to the true zero of e(a) (see e.g. 
the discussion in [7]). Convergence can be achieved if the number of Monte 
Carlo samples is increasing with iterations. In the limit of infinite number 
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of samples, one will recover the deterministic LM algorithm. In particular, 
one can conceive of a scheme to increase the number of Monte Carlo samples 
whenever the error value becomes of the order of magnitude of the Monte 
Carlo error. An alternative scheme to increase the number of Monte Carlo 
samples was proposed in [7|. On the other hand, if one can afford to have 
a large number of Monte Carlo samples for all iterations, the LM algorithm 
will reduce the error down to a value that is for all practical purposes zero 
(since the associated Monte Carlo error will be practically zero). We are not 
advocating here the use of the stochastic version of the LM algorithm for 
general problems. But, given the superiority of the deterministic LM algo- 
rithm for problems of moderate dimension like the ones we will be examining 
in Section 0] and the fact that we can afford a large number of Monte Carlo 
samples for all iterations, makes the stochastic LM algorithm a reasonable 
choice. We should also mention that what we refer to as a stochastic LM 
algorithm is different from the stochastic LM algorithm proposed by LeCun 

M- 

3.2 Reformulation of the density estimation optimization prob- 
lem as a filtering problem 

For most cases, the expectation values involved in maximum likelihood esti- 
mation can only be computed through Monte Carlo sampling. This renders 
random both the update equation (I14|) for the coefficients and the equation 
(|13p for the error. In addition, as mentioned above, maximum likelihood esti- 
mation is guaranteed to yield a local maximum in parameter space. Finding 
a global maximum likelihood estimate is equivalent to finding a global min- 
imum, zero in this case, for the error function e(a). This requires running 
the LM algorithm with different initial conditions and picking the best so- 
lution, i.e. the solution with the minimum value for the error function e(a). 
The combination of randomness in the update and error equations and the 
need to find a global maximum of the likelihood (in the parameter space), 
means that the problem of global maximum likelihood estimation is equiva- 
lent to a stochastic global optimization problem. We will apply the filtering 
reformulation of the previous section in order to solve this stochastic global 
optimization problem. 

In order to reformulate the stochastic global optimization problem as a 
particle filter (see reformulation at the end of Section f2.ip we need to specify 
the state density /i(a^ t+1 ^ \%f^), the objective function H s (x^^), as well as 
the observation density g(y^ t+l ^ ) , where j = 1,...,M. To conform 
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with the notation used above for exponential densities, this means that we 
need to specify h(a^ +1 ^ \cx~p ) , H s (a^^) and g(y <yt+l ' > W^ 1 ^). 

The state density /i(o^* +1 ^ \oCj ) is defined implicitly through the update 
equation (HU). Since the expectation values appearing in (JT3|) are estimated 
through Monte Carlo sampling, Eq. (|14|) becomes random. In particular, the 

Jacobian Jj(a^) for the j-th particle at iteration t is replaced by its Monte 
Carlo estimate, say with Ne samples, Jj(aj ). Similarly, the moment match- 
ing (error) vector f(a^) is replaced by its Monte Carlo estimate f(a^). For 
both Monte Carlo estimates the associated sampling error is 0(1/\/Ne) |15j . 

For the objective function H s (a^^) we define H s (a^^) = ( — — -) 1 / 2 , 
where e(a^* +1 ^) is the Monte Carlo estimate of the error e(a^* +1 ' ) ) of the j-th 

the j-th particle. The sampling error for ( p ) 1//2 is also 0(1/ V 'Ne) 



particle. The quantity ( p ) ' is the average error per moment for 

2 *< a *i h i/2 
I ) 

We have to specify also the observation density g(y <yt+1 ' > Recall 
that we have defined the observation process y( fc+1 ) as = H s (ai k+1 ^) + 

where ai k+1 ^ is the predicted state before the observation. We pick 
the random variable to be distributed as a Gaussian variable of mean 

zero and variance 0(1/Ne)- This means that we can write the conditional 
observation density value for the j-th particle as 

9{V |0 - ) = V^ eM } ' 

where a 2 = Also, as discussed at the end of Section |2"7T] we pick the 

value of the observation = min HAa^^). 

j=i,...,M * J 

With the above choices, the particle filter algorithm becomes: 

Particle filter algorithm for stochastic global optimization with 
the LM method 

1. Draw samples (particles) ct\ , . . . , from an initial density /j,o(x). 
Set A(°) = A and t = 0. 



2. Use [Jj T Jj + \® diag(Jj T Jj)](a^^ —a^) = —Jj T f(a^) to compute 
(t+i) 



samples a\- for j = 1, . . . , M 
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3. Compute H s {a { ^ 1] ) = ( 2 * e(a ^ + for j = . . . 5 JVf . Determine 



»(t+l) 
A i ■ 

4. Set = min # s (ai* +1) ). 

i=l,...,M *■? 



(t+i)^ _ i ( fa (t+1) -^(< +1) )) 2 



5. Compute the weights w*{a\- ) = -^== exp( , )<n 
where a 2 = for j = 1, . . . , M. 

6. Compute the normalized weights u> J (a^ +1 ' ) ) = -^j 2 — -, for j = 1, . . . , M. 

7. Choose the estimate Xt+i = arg max^Wj^a;^ 1 ^). Equivalently, we can 

choose x++i = arg min H B (oftj). 

j=l,...,M V *■> ' 

8. If t + 1 is equal to a maximum allowed number of iterations t max or 
y( t+l } satisfies a stopping criterion, terminate the algorithm. Else, 
proceed to next step. 

9. Resample from {or^ l \ . . . , a^jj ) with probability proportional to 

Wj, j = 1, . . . , M to produce new samples (q 1 * +1 ' ) , . . . , a^ 1 ^) at time 
t + 1. 

10. Set t = t + 1 and proceed to Step 2. 

The algorithm described above allows one to allocate more particles in 
areas of the parameter space which seem promising in an optimization sense. 
A more careful inspection of the way the algorithm works reveals that there 
are two issues that can prevent the algorithm above to improve on the naive 
algorithm. 

First, the particle filter algorithm can converge fast to a local minimum 
that is not the best minimum that the optimization algorithm could have 
reached starting from a set of initial conditions. This phenomenon, called 
premature convergence, is well-known in the stochastic optimization litera- 
ture, especially in the context of genetic algorithms [6]. To be more precise, 
it is possible that early on in the iteration process one particle can domi- 
nate in the resampling step, because it appears to perform better initially. 
However, there may be other particles that would have performed better if 
we allowed them to evolve for more iterations. Such particles can vanish 
during the first few resampling steps because they have low weights. One 
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way to avoid premature convergence in the context of the particle filter is to 
allow a few iterations without enforcing the filtering and resampling step so 
that the different particles can realize their potential to reach a minimum. 
After those iterations, the filtering and resampling steps can allocate more 
particles in the areas of the parameter space which are more promising. 

Second, in the particle filter algorithm above, all the offspring particles 
are assigned the value of the LM parameter A of their parent. This means 
that all the offspring particles will evolve to the same new parameter vector. 
The only difference between the offspring particles will be with regards to 
their error value at the next step. This difference comes from the randomness 
in the Monte Carlo sampling. This may not be enough to give the particle 
filter an appreciable advantage as far as the convergence speed is concerned. 

In our numerical studies, which will be presented in Section HI we did 
not encounter the problem of premature convergence. However, as will be 
shown, the particle filter in its generic form did not achieve consistently 
a speedup of the convergence compared to the naive global optimization 
algorithm. Recall that the main advantage of the particle filter algorithm 
is the flexibility provided by the filtering and resampling steps. We exploit 
this flexibility to construct a modified particle filter algorithm which can 
increase appreciably the speed of convergence. In some cases, the modified 
particle filter also achieved a smaller value of the error (compared to the 
naive algorithm) for a fixed maximum number of iterations. 

3.2.1 Modified particle filter algorithm 

To motivate the modified particle filter algorithm we have to examine more 
carefully the way that the LM algorithm works. As mentioned in Section 
13.1. 1\ the LM algorithm is a hybrid of the steepest descent method and 
Newton's method. In particular, depending on the value of the parameter 
A, the LM algorithm can be brought closer to steepest descent or Newton's 
method. For large values of A it is close to the steepest descent method, while 
for A = it reduces to Newton's method (assuming that the Jacobian is not 
singular). While the steepest descent method has guaranteed convergence 
the rate of convergence can be slow. On the other hand, Newton's method 
has a higher speed of convergence but whether it will converge or not is 
sensitive on the choice of initial conditions. 

The resampling step of the particle filter produces more copies (offspring) 
of a good, in an optimization sense, particle. The offspring particles inherit 
all the properties of the parent particle and, in particular, the same value 
for the parameter A, say Xp. However, nothing prevents us from assigning 
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different values of A to the different offspring of the same particle. After 
all, A is an adaptive parameter of the algorithm which should be chosen 
so as to accelerate convergence. The resampling step of the particle filter 
allows us to batch the offspring of a particle according to their ancestry. 
Then, we can assign different values to the offspring within a batch. Let 
(af +1 \ . . . ,ci!£ +1 ^) be the batch of offspring of a good particle. We can 
assign to the i-th element of this batch value \f +1 \ where i = 1,...,B. 
A value \f +1) > X P means that the LM algorithm for the i-th offspring 
particle will behave closer to the steepest descent method than the parent 
particle. Similarly, a value A^* +1 ^ < Ap means that the LM algorithm for 
the i-th offspring particle will behave closer to Newton's method than the 
parent particle. Since our goal is to accelerate convergence, i.e. bring the 
LM algorithm closer to Newton's method, we chose to assign the values of A 
to the offspring within a batch in the following manner: A^* +1 ^ = Ap/7* -1 , 

with 7 > 1. This means that xf +1 ^ = Ap and A^ +1 -* = Ap/7 B_1 . For batches 
containing many offspring (large B), this procedure allows us to explore 
fast the neighborhood of a good particle. Note that such an exploration is 
impossible in the context of the naive global optimization algorithm without 
increasing tremendously the cost of the algorithm. On the other hand, the 
cost of this exploration is negligible in the particle filter framework. A 
good choice for 7 can be found based on knowledge of how the parameter A 

influences the behavior of the LM algorithm, in particular the determinant 

- T ~ (t) ~ T - 

and condition number of the matrix Jj Jj + A^ diag{Jj Jj). The quality 
of the results appears to be pretty robust to the value of 7. Also, it is 
encouraging that for all the numerical examples we could use the same value 
of 7 and obtain equally good results. 



4 Numerical results 

We have applied the particle filter algorithm (both in its generic and mod- 
ified forms) to four examples of varying difficulty. The first two examples 
involve the estimation of the parameters of a known exponential density. 
The last two involve the estimation of the parameters of an exponential 
density so as to match certain moments of an unknown density. 

For all four examples the exponential density whose parameters are to 
be estimated is given by 

p(x, a ) = ^#^, 
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where a = (a±, . . . , 024) and x = (x%, . . . , x±). The potential function vector 
•(/>(x) is defined as 

f/>l-4(x) =Xi, for i = 1, . . . , 4 

V'5-i4(x) =XiXj, for i, j = 1, . . . , 4 and j > i (16) 
^15-24 (x) =xjx 2 j , for i, j = 1, . . . ,4 and j > i 

In addition, to ensure the integrability of p(x, a), we enforce «5, ag, «i2, au 
and CK15, . . . , 024 to be nonnegative. 



4.1 Independent variables with known density 

For the first example, we chose a§ = ag = a±2 = ai4 = 0.5, ais = aig = 
0:22 = 024 = 1 and the rest of coefficients were set equal to zero. With this 
choice p(x) = JliLi exp(— 0.5x^ — xj)/Z where Z = Yl i=i exp(— 0.5x| — 
xf)dxi is the normalization constant. We used N = 10 6 samples of this 
density to compute the moments Tj = Ylf=i ^fc( x i) f° r & = 1, ■■-,24. 
Before computing the moments, we normalized the samples by their em- 
pirical mean % = 1, . . . , 4, and standard deviation, cxj, i = 1, . . . , 4, i.e. 
Xij — > (xij — fj,i)/(Ji, for j = 1,...,N. The goal was to estimate the pa- 
rameters of an exponential density so that they reproduced the moments 
Tj. For the naive global optimization, the generic particle filter optimization 
algorithm and the modified particle filter optimization algorithm we used 
Ne = 10 4 samples to estimate all the necessary expectation values. The 
number of particles was set to M = 100. The initial condition for the j-th. 
particle (J = 1, . . . , M) was chosen as 

45 = -5(1 - Vjl), = - 5 (! - Vj2), 

a fl5 = 1 - - 5r /j5, 4l9 = 1 ~ - 5r ?i6 5 

a % = 1 - - 5r ?i7, aJ-24 = 1 - - 5r ?i8, 

where rjj\, . . . , 77^-3 are independent random variables uniformly distributed 
in [0, 1). The rest of the components of a are set initially to zero. 

For the modified particle filter algorithm, we set the parameter 7, which 
determines the value of A for the different particles in a batch, to the value 
7 = 1.1 (see discussion at the end of Section [3.2.1j) . We note that the same 
value of the parameter 7 was used for all four examples with equal success. 
The choice for 7 was not optimized, i.e., it was not found by trial and error. 
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5 10 15 20 

Iterations 

Figure 1: First example: Independent variables with known density (see text 
for details). Evolution of the average error per moment as a function of LM 
iterations. No PF corresponds to the naive global optimization algorithm, 
PF corresponds to the generic particle filter optimization algorithm and PFB 
to the modified particle filter optimization algorithm. 

As stated before, it can be chosen based on knowledge of how the parameter 
A influences the behavior of the LM algorithm, in particular the determinant 

~ T ~ (t) ~ T ~ 

and condition number of the matrix Jj Jj + Xj diag(Jj Jj). 

The most severe test for the particle filter and modified particle filter 
algorithms is to compare their error to the error of the naive global op- 
timization algorithm when all three algorithms are started from identical 
initial conditions for the particles. In addition, to eliminate possible dis- 
crepancies in the algorithms' behavior due only to the variability inherent 
in Monte Carlo sampling we performed for each algorithm 10 different ex- 
periments, each one with 100 particles, with identical initial conditions and 
averaged the results over the 10 experiments. Figured] presents the evolution 
of the average error per moment ( 2 *2^ ) 1//2 as a function of LM iterations 
for the three algorithms. The error bars denote the standard deviation of 
the average over the 10 experiments. It is obvious from the error bars that 
there is not much variability between the different experiments. After a few 
iterations the standard deviation becomes an order of magnitude smaller 
than the average. 

While all three algorithms converge to approximately the same value for 
the error, which is practically equal to the Monte Carlo error, the speed at 
which they do so is very different. The naive algorithm converges faster than 
the generic particle filter algorithm. As we discussed before (end of Section 
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13, 2p . this can happen. The modified particle filter algorithm significantly 
outperforms both the naive algorithm and the generic particle filter. In 
particular, the modified algorithm has practically converged by iteration 5 
while the other two algorithms need about 15 iterations to reduce the error 
to the same value. This is an increase in the convergence speed by more 
than 60%. 

It is important to discuss the computational cost at which this conver- 
gence speedup is achieved. The difference between both versions of the 
particle filter algorithm and the naive algorithm is the addition of the filter- 
ing and resampling steps. However, this cost is negligible compared to the 
Monte Carlo sampling computational cost which is needed to setup the LM 
algorithm and compute the error at each step. For all the examples studied 
here, the computational cost to perform a fixed number of iterations with 
the modified particle filter algorithm is about 2% more than the cost of the 
naive algorithm. Given the increase of the convergence speed this extra cost 
is well worthwhile. In particular, the extra cost of the modified particle fil- 
ter algorithm is less than the cost of adding 2% more particles in the naive 
algorithm. For this example this would mean adding 2 more particles to the 
naive algorithm. 

4.2 Dependent variables with known density 

For the second example, we chose 05 = ag = ol\i = Q14 = 0.5, 015 = 
. . . = 0:24 = 1 and the rest of coefficients were set equal to zero. Thus, the 
random variables x\, . . . , X4 are dependent. Figure [2] presents the evolution 
of the average error per moment ( 2 *2^ ) 1//2 as a function of LM iterations 
for the three algorithms. The naive algorithm and the generic particle filter 
have comparable behavior while the modified particle algorithm outperforms 
both of them significantly. As in the first example, the error for the modified 
particle algorithm has practically converged by iteration 5 to a value com- 
parable to the Monte Carlo sampling error while the other two algorithms 
need almost 20 iterations to do so. Thus, the convergence speedup of the 
modified particle algorithm is more than 70%. 

4.3 Sinusoidal signal with additive noise 

For the third and fourth examples we used the algorithms to estimate an ex- 
ponential density that reproduces the moments Tj = Y2j=i ^fc(xj) for k = 
1, ... ,24 of an unknown density. The examples are motivated by training 
neural networks to represent time series pQ. We will examine two cases 



21 




5 10 15 20 

Iterations 



Figure 2: Second example: Dependent variables with known density (see 
text for details). Evolution of the average error per moment as a function 
of LM iterations. No PF corresponds to the naive global optimization algo- 
rithm, PF corresponds to the generic particle filter optimization algorithm 
and PFB to the modified particle filter optimization algorithm. 

of a signal corrupted by noise. In the third example we suppose that we 
have samples from a signal u(t) = sin(27rt) + rjt where rjt is Gaussian white 
noise. At any instant t, the noise rjt ~ N(0,1). In the fourth example we 
suppose that we have samples from a signal u(t) = sin(27r(i + r])) where 
7] ~ iV(0, 0.1), i.e. a signal with a random phase. For both cases we assume 
that the signal is given for t £ [—1,1]. We expand the signal in Legendre 
polynomials (which are orthogonal in [-1,1]) and keep only the first 4 terms 
in the expansion. Since the signal is random, the coefficients of the expan- 
sion are random. The random variables x\,...,X4 are the coefficients of 
the first 4 Legendre polynomials. Exactly because the signal is a random 
function, we do not expect the expansion in Legendre polynomials to be 
an accurate one. The coefficients of the expansion are expected to fluctu- 
ate considerably from sample to sample. However, our purpose is to see 
how well an exponential density can represent the unknown density of the 
Legendre expansion coefficients. 

The parameters in the implementation of all three algorithms are the 
same as in the first two examples, except for the initial conditions. The 
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Figure 3: Third example: Sinusoidal signal with additive noise (see text for 
details). Evolution of the average error per moment as a function of LM 
iterations. No PF corresponds to the naive global optimization algorithm, 
PF corresponds to the generic particle filter optimization algorithm and PFB 
to the modified particle filter optimization algorithm. 

initial condition for the j-th particle (J = 1, . . . , M) was chosen as 



(0) 


= 1 




(0) 
«i9 


= 1 


~Vj2 




= 1 


~ Vj3, 


a (0) 


= 1 


-Vji 


a (0) 


= 1 


~ Vj5, 


a (0) 


= 1 


-Vj6 


(0) 
a j22 


= 1 


~ Vj7, 


(0) 
a j24 


= 1 


~Vj8 



where rjji, . . . ,rjjs are independent random variables uniformly distributed 
in [0, 1). The rest of the components of a are set initially to zero. 

Figure [3] presents the evolution of the average error per moment (^j^) 1 / 2 
as a function of LM iterations for the three algorithms. We see that the naive 
algorithm and the generic particle filter algorithm have comparable behav- 
ior. However, after 20 iterations, both of them have reduced the error to a 
value that is still about 30% larger than the error of the modified particle 
filter algorithm. The modified particle filter algorithm reduces the error to 
about 1.5 times the Monte Carlo error. 

4.4 Sinusoidal signal with random phase 

As mentioned in the preceding section, we suppose that we have samples 
from a random phase signal u(t) = sin(2-7r(i + rf)) where rj ~ iV(0, 0.1), with 
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Figure 4: Fourth example: Sinusoidal signal with random phase (see text 
for details). Evolution of the average error per moment as a function of LM 
iterations. No PF corresponds to the naive global optimization algorithm, 
PF corresponds to the generic particle filter optimization algorithm and PFB 
to the modified particle filter optimization algorithm. 



t € [—1,1]. As before, we expand the signal in Legendre polynomials and 
keep only the first 4 terms in the expansion. Since the signal is random, the 
coefficients of the expansion are random. The random variables xi,...,X4 
are the coefficients of the first 4 Legendre polynomials. The initial conditions 
for the particles are assigned in the same manner as in the third example. 

The fourth example is quite more challenging than the third example. 
In particular, dependencies among the coefficients of the expansion, i.e. the 
random variables x%, . . . ,x±, render the Jacobian J singular. This means 
that the optimization problem admits nonzero solutions for the error vector 
f k (a) = E a [ip k (x)) - i Y$Li ^fc(xj), k = 1, . . . , 24. Even though the ma- 

trix Jj Jj + Xj 'diag(Jj Jj) used in the calculation of the parameter vector 
increment is regularized, the best one can hope for is to keep reducing the 
error until the Jacobian becomes zero to within the arithmetic precision used 
(10~ 16 in our case). 

Figure H] presents the evolution of the average error per moment ( 2 * e ^ ) 1 / 2 
as a function of LM iterations for the three algorithms. The modified par- 
ticle filter algorithm has practically converged by the 12th iteration with 
an error value that is about 30% less than the error value achieved by the 
naive algorithm and the generic particle filter. However, the error value of 
the modified particle filter is still about 8 times larger than the Monte Carlo 



24 



error. This is because, as we mentioned in the previous paragraph, this op- 
timization problem admits nonzero solutions for the error vector f(a). Still, 
the convergence speedup of the modified particle filter algorithm for this 
example is about 40%. 

5 Discussion 

We have presented a reformulation of stochastic global optimization as a 
filtering problem. In particular we have reformulated stochastic global op- 
timization using a particle filter. This choice was based on the simplicity 
of implementation and flexibility of particle filters. We have exploited this 
flexibility to construct a modified particle filter filter that converges faster 
than naive global optimization. We have demonstrated the efficiency of the 
approach with several examples of varying difficulty. 

The flexibility allowed by the particle filter can be used to construct ad- 
ditional modified particle filters. For example, we can use ranking selection 
[6] in the resampling step instead of proportionate selection. The advantage 
of ranking selection compared to proportionate selection is that it establishes 
a constant pressure of selecting the particle with the largest weight. This 
can be helpful when there exist several particles with almost equally large 
weights. In this case, proportionate selection may not be able (for a finite 
number of particles) to sample the particle with the largest weight. Another 
possible modification of the particle filter to enhance the exploration of the 
parameter space can come from the use of recombination procedures between 
particles after the resampling step has been performed. The motivation for 
such a procedure comes from genetic algorithms [6], where recombination 
constitutes probably the most important feature of such algorithms. How- 
ever, there is no unique way to perform recombination and this can also be 
the weak point of genetic algorithms. Another modification of the parti- 
cle filter that we have already mentioned is to allow the particles to evolve 
for a few iterations without enforcing the observation and resampling steps. 
This allows the particles to exhibit their potential as far as locating a local 
minimum is concerned. After the first few iterations we can start enforcing 
the observation and resampling steps to allocate more particles in the more 
promising areas of the parameter space. Such a modification can help avoid 
the potential problem of premature convergence (see also the discussion in 
Section f3.2|) . Finally, we note that the particle filter algorithm can be used 
also to randomize a deterministic global optimization algorithm [20J. 

We hope that the algorithm proposed in this work will help in tackling 
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the multitude of optimization problems originating from real-world applica- 
tions. 
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